ABSTRACT

This paper describes methods to assess the impact on
students of a teacher using skills learned in a training program.
Three designs for assessing the effects of teacher training materials
are presented: time series design, equivalent time-samples design,
and posttest-only control group design. Data obtained by classroom
teachers while using the designs are included. Some of the
considerations when selecting appropriate research and evaluation
designs are discussed in addition to the problems of analyzing data
from the designs. An eight-item bibliography is included.
(Author/MJM)




FILMED FROM BEST AVAILABLE COPY




Acquiring Teaching Competencies: 

Reports and Studies 



National Center for the Development of
Training Materials in Teacher Education



SCHOOL OF EDUCATION
INDIANA UNIVERSITY
BLOOMINGTON




This series is published and distributed under the auspices of the National
Center for the Development of Training Materials in Teacher Education.
The National Center has been initiated and supported by a grant from
the National Center for the Improvement of Educational Systems, U.S.
Office of Education.

The primary objective of this publication series is to provide an outlet for
theoretical, procedural, technical and evaluational reports and studies in
the development of protocol and training materials, and in their use in the
acquisition of teaching competencies.

The editorial advisory board functions primarily to set policy regarding 
directions and purposes of the publication and areas of needed publication. 
Editors for each report will be selected from those listed below. 



Associates of the National Center
at Indiana University

Laurence D. Brown
David Gliessman
Gary M. Ingersoll
W. Howard Levie
James R. Okey
Philip G. Smith
Richard L. Turner
James D. Walden



External Editorial
Board Members

David Berliner
Far West Regional
Educational Laboratory
Berkeley, California

Bryce B. Hudgins
Washington University
St. Louis, Missouri

B. Othanel Smith
University of South
Florida, Tampa, Florida



Manuscripts for consideration should be submitted to:

Laurence D. Brown
National Center for the Development
of Training Materials in Teacher
Education
School of Education
Indiana University
Bloomington, Indiana 47401



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED
EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING
IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY
REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



Designs for the Evaluation of 
Teacher Training Materials 

James R. Okey and Jerome L. Ciesla



Report #2, October, 1972 



Foreword



A mandated concern of all those individuals engaged in the production of protocol and
training materials is that of evaluation. The specific objective of all the protocol and training
materials projects is to produce materials which have been tested and revised until it can be
demonstrated that the materials are effective. Unfortunately, not all of us are sophisticated in the
concepts and strategies of evaluation although we may know, or are coming to know, much about
the strategies by which materials are developed. The authors of this article have both kinds of
skills and are exceptionally well suited therefore to speak to the specific problems of evaluation
of the particular kinds of materials which concern the National Center for the Development of
Training Materials in Teacher Education.

The senior author, Dr. Okey, was one of the original project directors at Indiana University
with whom the National Center contracted to produce a set of training materials. His project
consisted of a self-instructional program entitled TEACHING FOR MASTERY and is based upon
Bloom's well-known formulation. The materials focus on the acquisition of skills in the
preparation and use of diagnostic examinations which provide information for student remedial work.
After having completed a preliminary set of materials, Dr. Okey proceeded to evaluate them not
only in terms of the immediate effect on the learner but also in terms of the effects on the
students of the learners. The end product is a validated set of materials. In addition, however,
there is a by-product in the form of an evaluation procedure which may provide a prototypical
model for evaluation of other such projects. The subject of the article of course is the evaluation
model. I believe it will prove to be helpful for many developers interested in evaluating their
training materials.



L. D. Brown, Editor



Designs for the Evaluation of
Teacher Training Materials

James R. Okey and Jerome L. Ciesla

Indiana University 

The intention in this paper will be to describe methods to assess
the impact on students of a teacher using skills learned in a training
program. To accomplish this, a program designed to train teachers in
a particular set of classroom skills will be described. Then, designs
used to assess the effect of teachers using these skills will be given.
Thus, while the paper describes a particular set of training materials
and methods for measuring their effectiveness, the intention is to
illustrate evaluation designs that have wide application for assessing
the effects of using teaching skills in terms of student outcomes.

Evaluation Questions

There are three crucial questions a developer or evaluator of 
teacher training materials needs to ask: 

1. Do teachers attain skills which the materials are designed to teach?

To answer this question requires measurement of whether a training
program is effective in producing stated performance outcomes. This
amounts to an internal or intrinsic evaluation (Scriven, 1967) of the
training package. For example, if an objective of a training package
is to learn to construct divergent questions, a posttest would be given
to a teacher following study of the package to assess achievement of
this skill. If an objective is to learn to construct evaluation items
for given objectives, a test administered to anyone studying the package
would indicate whether or not this skill was acquired. In either
event, the important question for the developer is whether the training
program produces the outcomes specified for it. Other aspects of internal
or intrinsic evaluation could be used by a developer or evaluator,
but these shall not be considered here.

2. Do teachers use skills from the training materials in their classrooms?

This evaluation is commonly performed with observation schedules
or rating forms (e.g., see Amidon and Hough, 1967). Observers enter
the classroom directly or vicariously to record what a teacher does.
Amount of teacher talk, frequency of verbal praise, or the type and
number of questions asked may be recorded, depending on whatever skills
were included in the training program being evaluated.

3. Does the use of skills by teachers have any effect on student learning?

This question concerns not the training package itself, but the
"payoff" for using the skills in it (Scriven, 1967). For example, if
teachers learn to construct diagnostic tests by studying training
materials, a payoff evaluation might determine whether use of this skill
increased student achievement. If teachers learned to use praise to
reward classroom participation, the effect of use of praise by teachers
on student attitude could be measured. The emphasis in each case is
not on acquisition of a skill, but on the effects of using it.

Each of the three questions posed above is important. A thorough
evaluation of a training program will attend to each one. The intention
in this paper, however, is to focus on designs to aid in answering the
third question, whether the use of certain skills by teachers has any
payoff in altered student achievement. The reason for focusing on the
latter question is that little attention has been given to the
relationships between teaching skills and student achievement (cf. Rosenshine
and Furst, 1971) and to the means of obtaining evidence of these
relationships.

Selecting Evaluation Designs 

Campbell and Stanley (1963) describe an extensive set of designs
for research and evaluation studies. For each design included in their
work, they discuss threats to validity, procedures for organizing groups,
methods of scheduling treatments and measurements, and suggestions for
analyzing data. Among the sixteen designs they describe, three are
identified as true experimental designs (Pretest-Posttest Control Group
Design, Solomon Four-Group Design, Posttest-Only Control Group Design) and
are recommended for use when possible.

Despite their acknowledged superiority for gathering data to answer
questions, the three recommended designs of Campbell and Stanley are
frequently difficult to use because each of the designs specifies one or
more control groups. Use of control groups, however beneficial for
obtaining reliable answers to questions, is often not practical because:

a. few subjects (teachers) may be available and dividing a
small population reduces the number of subjects for
measuring treatment effects.

b. subjects (teachers) resent placebo treatments or serving
as members of untreated control groups.

c. ethical questions arise regarding the use of control groups
or placebo treatments.

Payoff evaluation studies, by definition, must be done with teachers
who have students. To find teachers with students, a developer or
evaluator may go directly to schools to locate volunteers or work through
in-service classes. These employed teachers may be enticed into trying
new materials or techniques when they see an advantage to themselves in
doing so. However, it is difficult to convince teachers with a heavy
work load and numerous problems for which they desire help that they
should participate in a study as a member of a control group.

When it is impossible to use the recommended designs the next best
procedure can be tried, in this case using what Campbell and Stanley
call "quasi-experimental" designs. The difference between these and
true experimental designs lies in the degree to which the experimenter
has control over arranging treatments, selecting subjects, scheduling
observations, and other events which occur during an experiment.
Several of Campbell and Stanley's quasi-experimental designs are one-group
designs in which the same teachers act as both experimental and control
teachers; yet the designs allow a comparison of the effects of using and
not using selected teaching skills.

In the remainder of this paper three designs taken from Campbell
and Stanley (1963) will be used to demonstrate how data can be gathered
for payoff evaluation studies while avoiding the problem of setting up
separate groups of teachers for comparison purposes. The three designs
are singled out to illustrate alternative procedures for evaluating the
effects of training in a classroom setting. The training package used
in the studies will be described briefly and will be followed by a
description of each design and the sample data collected when using it.

The Training Materials

A self-instructional program called Teaching for Mastery (Okey and
Ciesla, 1972) designed to train teachers to implement Bloom's mastery
learning strategy (1968) was developed. The materials, which require
about five hours to complete, consist of tape-slide and paper and pencil
exercises. Frequent opportunities for practice and feedback are
included and self-tests with answers are available for each of the six
sections into which the program is divided. A total of 22 outcomes are
stated in the program that range from sequencing objectives, to
constructing diagnostic tests, to selecting alternative instruction for
unsuccessful students.

The overall goal of the training program is to teach teachers to
implement a five-step plan for increasing the achievement of their
students. The major skills required to do this are learning to prepare
and administer diagnostic examinations on course objectives at frequent
intervals, and then to direct students to remedial work as needed.

The Teaching for Mastery program was studied by all members of an
in-service class of 21 elementary school teachers about mid-way through
a 15-week term. Portions of two class periods were devoted to independent
study of the program with the remainder done outside of class.

Time Series Design

Campbell and Stanley (1963) diagram the Time Series Design as follows:

    O_1  O_2  O_3  O_4  X  O_5  O_6  O_7  O_8

The diagram shows a time sequence of events from O_1 on the left to
O_8 on the right. Measurements or observations (O_1, O_2, etc.) are made
at intervals and then a treatment (X) is introduced. Following the
treatment, measurements (O_5, O_6, etc.) are continued. This design has been
used to measure such things as attitude changes both preceding and
following an event such as showing a motion picture on race relations.
Another use might be to examine the number of students that leave school
before and after setting up a dropout-prevention program.

The Time Series Design is well suited to evaluating the effects of
teachers studying and using skills from a training package when a
two-group design is impossible. Multiple measurements before studying the
package allow pre-treatment or baseline behavior to be established.
Repeated measures after studying the package allow both immediate and
long-term effects to be measured. Using several observations before and after
a treatment allows an evaluator to interpret results more confidently
because transient or spurious effects are more apparent.

Figure 1 shows data gathered by a first grade teacher using a Time
Series Design with a class of 24 students. The plotted points represent
the percentage of children in the class scoring 90% and higher on
summative tests in mathematics given at approximately two-week intervals. The
first three observations were made prior to studying the Teaching for
Mastery materials and the last three after doing so. Thus, the graph shows
achievement results for about 12 weeks of instruction.

Figure 1. Student performance before and after teacher
training materials are studied.



The reason for studying the Teaching for Mastery materials was to
have teachers learn to use the skills taught in the package and thereby
increase student achievement. One measure of this achievement is the
number of students scoring at a selected level on tests over the
objectives for a unit. In this study the teacher used a 90% criterion level;
if students scored 90 or above on a unit test they were said to have
mastered the material. Other criteria could, of course, be used such
as an 80% criterion or the mean test score for all students.
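
The criterion measure described above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_meeting_criterion` and the list of scores are hypothetical and were invented for the example; they are not data from the study.

```python
def percent_meeting_criterion(scores, criterion=90.0):
    """Return the percentage of scores at or above the criterion."""
    if not scores:
        raise ValueError("scores must not be empty")
    meeting = sum(1 for s in scores if s >= criterion)
    return 100.0 * meeting / len(scores)


# Hypothetical unit-test scores (percent correct) for a small class.
# These numbers are invented for illustration, not taken from the study.
unit_scores = [95, 88, 92, 100, 74, 90, 85, 97]

pct_90 = percent_meeting_criterion(unit_scores, criterion=90.0)  # 90% criterion
pct_80 = percent_meeting_criterion(unit_scores, criterion=80.0)  # 80% criterion
mean_score = sum(unit_scores) / len(unit_scores)                 # class mean

print(pct_90, pct_80, mean_score)
```

The same scores give different pictures under the three criteria, which is why the choice of criterion should be fixed before the data are collected.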

The problem of analyzing data from a Time Series Design is
considerable. If the several observations before and after the treatment
are the same (e.g., repeated administration of the same attitude
measure), problems of comparison are simplified. In this case, however,
the observations are different; six unit tests are given, each
covering different objectives. To compare the scores is hazardous because
objectives from one unit may be more difficult than those from another.

In this study the procedure for analyzing data from the Time Series
Design was to compare the mean percentage of students achieving the
90% criterion before and after the treatment. These data are given in
Table 1. Correlated proportions should be used for this comparison
since the same class took a series of six tests. This was not done
because only a set of scores for the entire class was available for each
unit, not individual scores for individual pupils on each unit. Because
three observations were made both before and after the treatment, the
number of subjects used in calculating the z value is three times the
number of students in the class to account for the three observations
contained in the mean score. That is, a pre-treatment n of 60 (3 × 20)
and a post-treatment n of 69 (3 × 23) were used.¹ The difference between
the proportions is also significant (z = 1.99, p < .05) when 20 and 23
subjects are used in the calculations.



Table 1

Comparison of Pre- and Post-Treatment Achievement
in a Time Series Design

Measure           n     Percentage of students      Mean      z
                        scoring 90% or higher
                        on successive tests

Pre-treatment     60    40    75    57              57.3
Post-treatment    69    83    78    87              82.7      4.6*

* p < .001
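
As a check on the arithmetic in Table 1, the mean percentages and the pooled n values can be recomputed directly from the figures reported in the text. The variable names are illustrative only, and the sketch does not attempt the significance test, since the report does not state the formula used for z.

```python
# Percentages of students scoring 90% or higher, as reported in Table 1.
pre_percentages = [40, 75, 57]    # three tests before the training was studied
post_percentages = [83, 78, 87]   # three tests after the training was studied


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


pre_mean = mean(pre_percentages)
post_mean = mean(post_percentages)
gain = post_mean - pre_mean

# The report pools the three observations in each phase: n = 3 x class size.
pre_n = 3 * 20
post_n = 3 * 23

print(round(pre_mean, 1), round(post_mean, 1), round(gain, 1), pre_n, post_n)
```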



More sophisticated data analyses than shown here are possible when
using Time Series Designs. In this study the proportion of students
scoring above a certain level on unit tests was selected because this
was the criterion teachers were encouraged to use in the training
program. Campbell and Stanley (1963) treat the problem of comparison of
observations from Time Series Designs at greater length.

¹ The number of students in the class fluctuated during the study. The
average number before the treatment was 20 and after the treatment
was 23.




Equivalent Time-Samples Design

The Equivalent Time-Samples Design is diagrammed by Campbell and
Stanley (1963) as follows:

    X_1 O    X_0 O    X_1 O    X_0 O

A time sequence of events is shown starting with treatment (X_1) on
the left and proceeding to the final observation on the right. This
design can be thought of as an "on and off" design. A treatment is
introduced (X_1) and then withheld (X_0), then reintroduced and then
withheld again, and so on. In other words, the treatment or experimental
variable is turned on and off. After each use or non-use an
observation (O) is made of the behavior being examined.

The Equivalent Time-Samples Design can be readily used for
assessing the power of skills learned in a teacher training package. Suppose
a teacher learns to use certain questioning skills. These skills can
then alternately be used and not used in successive encounters with
students. Students' attitudes or intellectual achievements under each
treatment can serve as dependent variables to assess the effectiveness
of the skills.

If teaching skills have an effect on student learning and are
alternately turned on and off in successive units, a saw-tooth type of
achievement record should result. When the skills are in effect student
achievement should be up; when not used, student achievement should be
down. Of course, a reverse situation would be expected if the teaching
skills were designed to alter a behavior such as frequency of classroom



Figure 2 shows the results obtained by a sixth grade teacher using
the Equivalent Time-Samples Design with 29 students during four
successive units in a mathematics class. During the four units, each
approximately two weeks in length, the teacher alternately used and did not use
the skills studied in the Teaching for Mastery training materials.



Figure 2. Performance on units during which the
teacher turns skills on and off. (Vertical axis: percentage of
students scoring 80% or higher on unit tests. Horizontal axis:
successive units 1 through 4.)

Results obtained with the on and off treatment confirmed
expectations. When the teacher used skills learned in the training program,
student achievement was up; when skills were not used, student
achievement fell. Table 2 shows the results of a test for the significance of
difference in achievement for the two units when the skills were used
and the two for when they were not. A total of 29 students studied
each of the four units. Values of n = 58 were used in the calculation
to reflect the two observations under each treatment condition. When
an n of 29 is used, a z value of 1.23 (p < .09) is obtained.
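
One common way to compare two proportions is a pooled z test for independent samples. The sketch below applies it to the mean percentages for the two conditions (64% with skills used, 48% without). The report does not state the formula it used, and it notes elsewhere that a test for correlated proportions would be more appropriate, so the function `two_proportion_z` is an assumption made for illustration and is not claimed to reproduce every tabled value. With an n of 29 per condition it gives a z of about 1.23, which agrees with the conservative value quoted above.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic for the difference between two independent proportions.

    Assumption: the two samples are treated as independent, which is a
    simplification when the same class is measured under both conditions.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / standard_error


# Mean proportions meeting the 80% criterion: skills used vs. skills not used.
z_conservative = two_proportion_z(0.64, 29, 0.48, 29)
print(round(z_conservative, 2))
```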




Table 2

Comparison of Achievement in an Equivalent
Time-Samples Design

Treatment          n     Percentage of students      Mean          z
Condition                scoring 80% or higher       Percentage
                         on successive tests

Skills used        58    62    66                    64
Skills not used    58    48    48                    48            3.4*

* p < .001

Posttest-Only Control Group Design

The Posttest-Only Control Group Design is diagrammed by Campbell
and Stanley (1963) as follows:

    R  X  O_1
    R     O_2

This design is an excellent one to use when testing the
effectiveness of teaching skills except for the difficulty of withholding an
experimental treatment from a group of teachers. A way around the problem
of withholding treatments from teachers, however, is to have teachers
withhold certain treatments from portions of their students for limited
periods of time. For example, a teacher studies a set of training
materials and learns certain skills that are intended to alter student
behavior. To test the effectiveness of these skills, the teacher divides
a class, using a randomization procedure, into two groups. For a short
period of time, perhaps for two or three weeks, he teaches one of these
groups using the learned skills and teaches the other group without using
them. Both groups of students pursue the same objectives and are judged
using the same criteria whether a unit test, an observation instrument,
or some other evaluation instrument. Appropriate procedures for
isolating the groups during study (e.g., sending one group to the library while
the other group is taught) and avoiding other sources of contamination
(e.g., alternating the order in which the two groups are taught on
successive days) are necessary.
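
The randomization step can be sketched as a simple shuffle-and-split. The function `split_class`, the roster names, the roster size, and the seed are all hypothetical choices made for this illustration; any procedure that gives every student an equal chance of landing in either group would serve.

```python
import random


def split_class(students, seed=None):
    """Randomly divide a class roster into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(students)      # copy so the original roster is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical roster; the names are placeholders, not study participants.
roster = [f"student_{i:02d}" for i in range(1, 27)]

skills_used_group, skills_not_used_group = split_class(roster, seed=1972)
print(len(skills_used_group), len(skills_not_used_group))
```

Recording the seed, or the resulting group lists, lets the assignment be audited later.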

Table 3 shows data obtained by a third grade teacher using the
Posttest-Only Control Group Design with 26 students during a two-week unit
on fractions. The students were divided at random into two groups and
taught by using and not using the skills from the Teaching for Mastery
program. Both groups took the same test over the same set of 20
objectives at the end of the unit.

Table 3

Scores for Students Taught While the Teacher
Used and Did Not Use Mastery Teaching Skills

Group                      n     Mean    SD      t

Mastery skills not used    13    10.2    3.20
Mastery skills used        13    12.8    3.25    2.0*

* p < .05
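
The t value in Table 3 can be approximated from the summary statistics alone. The sketch below uses the usual pooled-variance formula for two independent groups and assumes the tabled SDs are sample standard deviations (n - 1 in the denominator); the report does not say which convention was used. Under that assumption the statistic comes out slightly above 2.0, close to the tabled value, with 24 degrees of freedom.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with a pooled variance estimate.

    Assumption: sd1 and sd2 are sample standard deviations (n - 1 denominator).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


# Summary statistics from Table 3: skills used vs. skills not used.
t_value = pooled_t(12.8, 3.25, 13, 10.2, 3.20, 13)
degrees_of_freedom = 13 + 13 - 2

print(round(t_value, 2), degrees_of_freedom)
```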



Discussion

The first point to be made is that the designs illustrated in this
paper for measuring the effects of teacher training materials are not
new designs. They have been described at length by a variety of people
and have been used extensively. They have not, however, been used often
for measuring teacher training effects. As Rosenshine and Furst (1971)
point out, there have not been many studies (they report approximately
50) in which the relationship between teacher behavior and student
achievement is examined. Even among the studies reported, most have been
correlational. The studies reported in this paper are experimental and
illustrate the use of designs for examining cause and effect relationships
between teaching skills and student achievement.

Another point to be made is that these designs are not better than
others that might be used. An investigator or developer of training
materials should select the design that is possible to use under the
circumstances that exist. For example, a time series design, because it is
a single group design, is probably less ideal than several of the "two
group" designs, but it is not always possible to constitute several
treatment groups in a study. If the most that the developer has available to
him is a single group, he has to select some design that will collect the
maximum data in that situation.

Whenever investigations are carried out it is important to keep in
mind the audience to whom one wishes to speak. Different information
may be necessary to demonstrate to different groups the effectiveness of
a treatment. Classroom teachers and principals (who are probably less
sophisticated in statistical analysis) are likely to be more interested
in descriptive data (of the type shown in the graphs on the previous
pages) than in an analysis of variance table or the results of a z or
t test. Persons who have a background in statistics will be likely to
require different results to be convinced of the power of a treatment.
One can see that both descriptive and inferential statistics play a
role in communicating the results of an investigation to potential users.

Perhaps too often we have decided that inferential statistics are
needed in order to assess the effects of treatments. If you look,
however, at the results that the teacher obtained in the Equivalent
Time-Samples study in this paper, you will see a fairly pronounced treatment
effect between the times the teacher was using the skills and the times
she was not. Although this is fairly dramatic when presented graphically,
the results are not significantly different when a .05 level of
significance is used with an n of 29 students. Thus, inferential statistics
may lead one to the conclusion that there was no significant treatment
effect, while a descriptive display of the data leads one to conclude
the opposite.

Additional analytic power could have been achieved in these studies
by selecting appropriate classification variables and blocking on these
for precision of analysis. For example, one could have obtained IQ scores,
motivation scores, or creativity scores from students, and then blocked
accordingly. Not only would this have given more power to the analysis,
but it would have allowed identification of interactions among different
sub-groups on classification variables with certain treatments. One
might find, for example, that high achievement-oriented students do best
under mastery conditions or that certain IQ groups are differentially
affected by use of certain teaching skills. In other words, certain
aptitude-treatment interactions could be identified by selecting appropriate
classification variables and determining which sub-groups on these
variables interact favorably or unfavorably with certain treatment conditions.

Throughout this study the investigators had minimal contact with the
teachers when they were in their classrooms. In fact, no visits were made
to any classrooms. The only intervention by the investigators was to tell
the teachers what data to gather, what time intervals to use, and what
design to follow. The teachers collected all data and instituted all
treatments. It should also be pointed out that because of this there was no
check on the teachers' fidelity regarding use of the skills that they
learned in the training package. For future studies, observation
instruments or rating scales should be developed in the manner of Worthen (1968)
to establish the degree to which the teachers incorporate the strategies
or use the skills that they learned in the training materials in their
actual classroom work.

Data from only three teachers from the in-service class of 21 are
reported in this paper. Quite obviously some of the more successful ones
are reported. Statistically significant results were obtained by
teachers using each of the three designs although outcomes varied; some
teachers were able to cause highly significant changes in student performance
and others were not. Most of the teachers in the study, however, (more
than 80%) were able to effect some degree of improved performance with
their students. What the three studies describe, therefore, is what
certain teachers were able to do after receiving a limited amount of
instruction from a short piece of training material that was in a preliminary
phase of development. Data obtained from the 21 teachers are being used
to revise the Teaching for Mastery training package.

A final comment should be made about the rigor of the studies
reported here, the analysis of data obtained from them, and the confidence
one can place in the results. Certainly the results obtained in any of
the studies fall short of a full scale validation of the training program.
Little control was maintained over the teachers and no measures were made
of their ability to institute the treatments. Some students of statistics
may quarrel with the data analysis for each of the designs. In particular,
the number of degrees of freedom to use when calculating the z values is
arguable. We have analyzed the data using one set of assumptions and made
a case for doing so. An alternative and more conservative analysis is
also reported. As Guba (1969) has noted, evaluation studies in a field
setting almost invariably fail to meet some of the criteria for traditional
research studies. Because this is true for the three studies described,
the confidence in the results falls somewhat short of that obtained from
a laboratory controlled study, but is a good deal greater than the
confidence one has in an untried instructional program.

Conclusion

Three designs for assessing the effects of studying teacher training
materials are given along with data obtained by classroom teachers when
they were used. The teachers had studied a self-instructional training
package designed to teach them to use Bloom's mastery learning strategy.
Some of the considerations when selecting appropriate research and
evaluation designs are discussed. Problems of analyzing data from the designs
are also considered.